Skip to content

Conversation

@jkukul
Copy link
Contributor

@jkukul jkukul commented Jul 10, 2016

Currently, converting protobuf messages with extension can result in an uninformative error or a data corruption. A more detailed explanation in the corresponding jira.

This patch simply ignores extension fields in protobuf messages.

In the longer run, I'd like to add a proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.

// they have to be ignored.
continue;
}

Copy link
Contributor Author

@jkukul jkukul Jul 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be throw an Exception here and explicitly do not support messages with extensions.

I could also log a warning here, somehow only once though, to avoid the noise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the approach of throwing an exception. It is easy to add support for a special case later. It is harder to change behavior as it breaks existing applications.
For example in Avro we explicitly throw in the case of recursive schemas.

@julienledem
Copy link
Member

@lukasnalezenec what do you think?

@lukasnalezenec
Copy link
Contributor

Exception is good idea but I am not sure if it is acceptable change of behaviour.
We should at least add some logging there.
How about making it configurable ?

jkukul added 3 commits July 21, 2016 20:11
Extension fields can have overlapping field indexes with base fields.

Writing extension fields can result in unexpected data corruption or an
error.
@jkukul
Copy link
Contributor Author

jkukul commented Jul 21, 2016

@lukasnalezenec I adjusted the PR to throw an exception in 27580ab. I also think this is a better solution, after some consideration.

The PR doesn't introduce any change in behaviour. An informative exception is thrown, instead of a confusing one which would follow otherwise.


fieldWriters[i] = writer;
i++;
fieldWriters[fieldDescriptor.getIndex()] = writer;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the same change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fields in the list over which we're iterating are ordered by index. Here's how the list is built:
https://github.com/google/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/Descriptors.java#L826

@julienledem
Copy link
Member

this looks good to me.
@lukasnalezenec ?

@asfgit asfgit closed this in e54ca61 Sep 8, 2016
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
Currently, converting protobuf messages with extension can result in an uninformative error or a data corruption. A more detailed explanation in the corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660).

This patch simply ignores extension fields in protobuf messages.

In the longer run, I'd like to add a proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.

Author: Jakub Kukul <jakub.kukul@gmail.com>

Closes apache#351 from jkukul/master and squashes the following commits:

27580ab [Jakub Kukul] PARQUET-660: Throw Unsupported exception for messages with extensions.
db6e08b [Jakub Kukul] PARQUET-660: Refactor: Don't use additional variable for indexing fieldWriters.
e910a8a [Jakub Kukul] PARQUET-660: Refactor: Add missing indentation.
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 6, 2017
Currently, converting protobuf messages with extension can result in an uninformative error or a data corruption. A more detailed explanation in the corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660).

This patch simply ignores extension fields in protobuf messages.

In the longer run, I'd like to add a proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.

Author: Jakub Kukul <jakub.kukul@gmail.com>

Closes apache#351 from jkukul/master and squashes the following commits:

27580ab [Jakub Kukul] PARQUET-660: Throw Unsupported exception for messages with extensions.
db6e08b [Jakub Kukul] PARQUET-660: Refactor: Don't use additional variable for indexing fieldWriters.
e910a8a [Jakub Kukul] PARQUET-660: Refactor: Add missing indentation.
rdblue pushed a commit to rdblue/parquet-mr that referenced this pull request Jan 10, 2017
Currently, converting protobuf messages with extension can result in an uninformative error or a data corruption. A more detailed explanation in the corresponding [jira](https://issues.apache.org/jira/browse/PARQUET-660).

This patch simply ignores extension fields in protobuf messages.

In the longer run, I'd like to add a proper support for Protobuf extensions. This might take a little longer though, so I've decided to improve the current situation with this patch.

Author: Jakub Kukul <jakub.kukul@gmail.com>

Closes apache#351 from jkukul/master and squashes the following commits:

27580ab [Jakub Kukul] PARQUET-660: Throw Unsupported exception for messages with extensions.
db6e08b [Jakub Kukul] PARQUET-660: Refactor: Don't use additional variable for indexing fieldWriters.
e910a8a [Jakub Kukul] PARQUET-660: Refactor: Add missing indentation.
@schrepfler
Copy link

Is there any movement on this item and get support of serialising protobuffer extensions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants